113 research outputs found

    Regular configurations and TBR graphs

    PhD 2009 QM. This thesis consists of two parts: the first is concerned with the theory and applications of regular configurations; the second is devoted to TBR graphs. In the first part, a new approach is proposed to study regular configurations, an extremal arrangement of necklaces formed by a given number of red beads and black beads. We first show that this concept is closely related to several other concepts studied in the literature, such as balanced words, maximally even sets, and the ground states in the Kawasaki-Ising model. Then we apply regular configurations to solve the (vertex) cycle packing problem for shift digraphs, a family of Cayley digraphs. TBR is one of the widely used tree rearrangement operations, and plays an important role in heuristic algorithms for phylogenetic tree reconstruction. In the second part of this thesis we study various properties of TBR graphs, a family of graphs associated with the TBR operation. To investigate the degree distribution of the TBR graphs, we also study -index, a concept introduced to measure the shape of trees. As an interesting by-product, we obtain a structural characterization of good trees, a well-known family of trees that generalizes the complete binary trees.

    On Patchworks and Hierarchies

    Motivated by questions in biological classification, we discuss some elementary combinatorial and computational properties of certain set systems that generalize hierarchies, namely, 'patchworks', 'weak patchworks', 'ample patchworks' and 'saturated patchworks', and also outline how these concepts relate to an apparently new 'duality theory' for cluster systems that is based on the fundamental concept of 'compatibility' of clusters. Comment: 17 pages, 2 figures

    Spaces of phylogenetic networks from generalized nearest-neighbor interchange operations

    Phylogenetic networks are a generalization of evolutionary or phylogenetic trees that are used to represent the evolution of species which have undergone reticulate evolution. In this paper we consider spaces of such networks defined by some novel local operations that we introduce for converting one phylogenetic network into another. These operations are modeled on the well-studied nearest-neighbor interchange (NNI) operations on phylogenetic trees, and lead to natural generalizations of the tree spaces that have been previously associated to such operations. We present several results on spaces of some relatively simple networks, called level-1 networks, including the size of the neighborhood of a fixed network, and bounds on the diameter of the metric defined by taking the smallest number of operations required to convert one network into another. We expect that our results will be useful in the development of methods for systematically searching for optimal phylogenetic networks using, for example, likelihood and Bayesian approaches.

    UPGMA and the normalized equidistant minimum evolution problem

    UPGMA (Unweighted Pair Group Method with Arithmetic Mean) is a widely used clustering method. Here we show that UPGMA is a greedy heuristic for the normalized equidistant minimum evolution (NEME) problem, that is, finding a rooted tree that minimizes the minimum evolution score relative to the dissimilarity matrix among all rooted trees with the same leaf-set in which all leaves have the same distance to the root. We prove that the NEME problem is NP-hard. In addition, we present some heuristic and approximation algorithms for solving the NEME problem, including a polynomial time algorithm that yields a binary, rooted tree whose NEME score is within $O(\log^2 n)$ of the optimum.
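
For readers unfamiliar with the clustering method named in this abstract, the following is a minimal Python sketch of classical UPGMA: repeatedly merge the two closest clusters and average dissimilarities weighted by cluster size. It illustrates only the textbook greedy procedure, not the paper's NEME heuristics or approximation algorithm. The function name `upgma`, the `frozenset`-keyed input format, and the nested-tuple tree representation are assumptions made for this example.

```python
from itertools import combinations


def upgma(labels, dist):
    """Classical UPGMA on a small dissimilarity table.

    labels: list of leaf names.
    dist:   dict mapping frozenset({a, b}) -> dissimilarity (hypothetical format).
    Returns (tree, height): a nested tuple of leaf names and the root height.
    """
    # Each active cluster id maps to (subtree, number of leaves, height).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(labels)}
    d = {
        frozenset((i, j)): dist[frozenset((labels[i], labels[j]))]
        for i, j in combinations(range(len(labels)), 2)
    }
    next_id = len(labels)
    while len(clusters) > 1:
        # Greedy step: pick the pair of active clusters with the smallest dissimilarity.
        a, b = min(combinations(sorted(clusters), 2), key=lambda p: d[frozenset(p)])
        tree_a, size_a, _ = clusters.pop(a)
        tree_b, size_b, _ = clusters.pop(b)
        # The merged cluster sits at half the dissimilarity of its two children.
        height = d[frozenset((a, b))] / 2
        # Size-weighted average dissimilarity to every remaining cluster.
        for c in clusters:
            d[frozenset((next_id, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b, height)
        next_id += 1
    (tree, _, height), = clusters.values()
    return tree, height


if __name__ == "__main__":
    example = {
        frozenset(("A", "B")): 2.0,
        frozenset(("A", "C")): 6.0,
        frozenset(("B", "C")): 6.0,
    }
    print(upgma(["A", "B", "C"], example))
```

In the resulting tree every leaf lies at the same distance from the root, which is the equidistance condition the abstract refers to.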

    Structural properties of the reconciliation space and their applications in enumerating nearly-optimal reconciliations between a gene tree and a species tree

    Introduction: A gene tree for a gene family is often discordant with the containing species tree because of its complex evolutionary course, during which gene duplication, gene loss and incomplete lineage sorting events might occur. Hence, it is a great challenge to infer the containing species tree from a set of gene trees. One common approach to this inference problem is through gene tree and species tree reconciliation. Results: In this paper, we generalize the traditional least common ancestor (LCA) reconciliation to define a reconciliation between a gene tree and species tree under the tree homomorphism framework. We then study the structural properties of the space of all reconciliations between a gene tree and a species tree in terms of the gene duplication, gene loss or deep coalescence costs. As an application, we show that the LCA reconciliation is the unique one that has the minimum deep coalescence cost, provide a novel characterization of the reconciliations with the optimal duplication cost, and present efficient algorithms for enumerating (nearly-)optimal reconciliations with respect to each cost. Conclusions: This work provides a new graph-theoretic framework for studying gene tree and species tree reconciliations.

    Hierarchies from lowest stable ancestors in nonbinary phylogenetic networks

    The reconstruction of the evolutionary history of a set of species is an important problem in classification and phylogenetics. Phylogenetic networks are a generalization of evolutionary trees that are used to represent histories for species that have undergone reticulate evolution, an important evolutionary force for many organisms (e.g. plants or viruses). In this paper, we present a novel approach to understanding the structure of networks that are not necessarily binary. More specifically, we define the concept of a closed set and show that the collection of closed sets of a network forms a hierarchy, and that this hierarchy can be deduced from either the subtrees or subnetworks on all 3-subsets. This allows us to also show that closed sets generalize the concept of the SN-sets of a binary network, sets which have proven very useful in elucidating the structure of binary networks. We also characterize the minimal closed sets (under set inclusion) for a special class of networks (2-terminal networks). Taken together, we anticipate that our results should be useful for the development of new phylogenetic network reconstruction algorithms.

    On the Subgroup Distance Problem

    We investigate the computational complexity of finding an element of a permutation group $H \subset S_n$ with a minimal distance to a given $\pi \in S_n$, for different metrics on $S_n$. We assume that $H$ is given by a set of generators, such that the problem cannot be solved in polynomial time by exhaustive enumeration. For the case of the Cayley Distance, this problem has been shown to be NP-hard, even if $H$ is abelian of exponent two (Pinch, 2006). We present a much simpler proof for this result, which also works for the Hamming Distance, the $\ell_p$ distance, Lee's Distance, Kendall's tau, and Ulam's Distance. Moreover, we give an NP-hardness proof for the $\ell_\infty$ distance using a different reduction idea. Finally, we settle the complexity of the corresponding fixed-parameter and maximization problems.
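
To make the problem statement in this abstract concrete, here is a small Python sketch that solves tiny instances by brute force: it enumerates the subgroup generated by a list of permutations and returns the smallest distance to a target permutation, for the Hamming and Cayley distances. As the abstract notes, exhaustive enumeration is not a polynomial-time method, so this is only an illustration for very small groups. All function names and the tuple encoding of permutations are assumptions made for this example.

```python
from collections import deque


def compose(p, q):
    """Return the permutation i -> p[q[i]] (apply q first, then p)."""
    return tuple(p[i] for i in q)


def inverse(p):
    """Return the inverse of a permutation stored as a tuple of images."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def hamming_distance(p, q):
    """Number of positions at which p and q differ."""
    return sum(1 for a, b in zip(p, q) if a != b)


def cayley_distance(p, q):
    """Minimum number of transpositions turning p into q.

    Equals n minus the number of cycles of p^{-1} q.
    """
    r = compose(inverse(p), q)
    seen = [False] * len(r)
    cycles = 0
    for start in range(len(r)):
        if not seen[start]:
            cycles += 1
            i = start
            while not seen[i]:
                seen[i] = True
                i = r[i]
    return len(r) - cycles


def generate_group(generators):
    """Enumerate the subgroup generated by `generators` (exponential in general)."""
    identity = tuple(range(len(generators[0])))
    group = {identity}
    queue = deque([identity])
    while queue:
        g = queue.popleft()
        for s in generators:
            h = compose(g, s)
            if h not in group:
                group.add(h)
                queue.append(h)
    return group


def subgroup_distance(generators, pi, dist):
    """Smallest dist(h, pi) over all h in the generated subgroup."""
    return min(dist(h, pi) for h in generate_group(generators))


if __name__ == "__main__":
    gens = [(1, 0, 2, 3), (0, 1, 3, 2)]  # an abelian group of exponent two
    pi = (1, 2, 0, 3)
    print(subgroup_distance(gens, pi, cayley_distance))
    print(subgroup_distance(gens, pi, hamming_distance))
```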

    The Combinatorics of Tandem Duplication

    Tandem duplication is an evolutionary process whereby a segment of DNA is replicated and proximally inserted. The different configurations that can arise from this process give rise to some interesting combinatorial questions. Firstly, we introduce an algebraic formalism to represent this process as a word producing automaton. The number of words arising from $n$ tandem duplications can then be recursively derived. Secondly, each single word accounts for multiple evolutions. With the aid of a bi-coloured 2d-tree, a Hasse diagram corresponding to a partially ordered set is constructed, from which we can count the number of evolutions corresponding to a given word. Thirdly, we implement some subtree prune and graft operations on this structure to show that the total number of possible evolutions arising from $n$ tandem duplications is $\prod_{k=1}^n(4^k - (2k + 1))$. The space of structures arising from tandem duplication thus grows at a super-exponential rate with leading order term $\mathcal{O}(4^{\frac{1}{2}n^2})$.
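
As a quick numerical illustration of the product formula quoted in this abstract, the Python sketch below evaluates $\prod_{k=1}^n(4^k - (2k + 1))$ for small $n$. The function name `count_evolutions` is hypothetical; the code only evaluates the stated formula and does not reproduce the paper's tree construction.

```python
def count_evolutions(n: int) -> int:
    """Evaluate prod_{k=1}^{n} (4^k - (2k + 1)), the formula quoted in the abstract."""
    total = 1
    for k in range(1, n + 1):
        total *= 4**k - (2 * k + 1)
    return total


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, count_evolutions(n))
```

The factors for $k = 1, 2, 3$ are 1, 11 and 57, so the count is already 627 at $n = 3$, consistent with the super-exponential growth the abstract describes.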